Efficient HPSG Parsing with Supertagging and CFG-Filtering
Abstract
An efficient parsing technique for HPSG is presented. Recent research has shown that supertagging is a key technology for improving both the speed and accuracy of lexicalized grammar parsing. We show that further speed-up is possible by eliminating non-parsable lexical entry sequences from the output of the supertagger. The parsability of each lexical entry sequence is tested by a technique called CFG-filtering, which uses a CFG that approximates the HPSG. The lexical entry sequences that pass the CFG-filter are combined into parse trees by a simple shift-reduce parsing algorithm, in which structural ambiguities are resolved using a classifier and all the syntactic constraints represented in the original grammar are checked. Experimental results show that our system achieves comparable accuracy with a speed-up by a factor of six (30 msec/sentence) compared with the best published result using the same grammar.
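The filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy grammar, the supertag names (`np`, `vp_intrans`, `v_trans`), the probabilities, and the function names are all hypothetical. In the paper the CFG is an approximation of the HPSG; here it is a hand-written grammar in Chomsky normal form, and candidate sequences are enumerated by brute force, which only works for toy inputs.

```python
from itertools import product
from math import prod

# Hypothetical toy CFG in Chomsky normal form; terminals are supertag names.
UNARY = {            # supertag -> nonterminals that can dominate it
    "np": {"NP"},
    "vp_intrans": {"VP"},
    "v_trans": {"V"},
}
BINARY = {           # (left child, right child) -> parent nonterminals
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
}

def cfg_accepts(tags, start="S"):
    """CKY recognition: True if the toy CFG derives the supertag sequence."""
    n = len(tags)
    if n == 0:
        return False
    # chart[i][k] holds the nonterminals that span tags[i:k].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tag in enumerate(tags):
        chart[i][i + 1] = set(UNARY.get(tag, ()))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for left in chart[i][j]:
                    for right in chart[j][k]:
                        chart[i][k] |= BINARY.get((left, right), set())
    return start in chart[0][n]

def filtered_sequences(candidates):
    """Yield supertag sequences that pass the CFG filter, most probable first.

    candidates: one list per word of (supertag, probability) pairs.
    Brute-force enumeration of all combinations (an assumption for brevity).
    """
    scored = sorted(
        product(*candidates),
        key=lambda seq: prod(p for _, p in seq),
        reverse=True,
    )
    for seq in scored:
        tags = [t for t, _ in seq]
        if cfg_accepts(tags):
            yield tags

# Hypothetical supertagger output for a three-word sentence.
candidates = [
    [("np", 0.9)],
    [("vp_intrans", 0.6), ("v_trans", 0.4)],
    [("np", 0.8), ("vp_intrans", 0.2)],
]
best = next(filtered_sequences(candidates))
print(best)  # ['np', 'v_trans', 'np']
```

In this example the most probable sequence (`np vp_intrans np`) is rejected because the toy CFG cannot derive it, so the next candidate is handed on. A full system along the lines of the abstract would then pass the surviving sequence to a shift-reduce parser that checks the original grammar's constraints.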
Similar resources
HPSG Supertagging: A Sequence Labeling View
Supertagging is a widely used speed-up technique for deep parsing. Supertagging has also been exploited in NLP tasks other than parsing, to make use of the rich syntactic information given by the supertags. However, the performance of the supertagger is still a bottleneck for such applications. In this paper, we investigated the relationship between supertagging and parsing, not just to...
Compiling an HPSG-based grammar into more than one CFG
Recently, the performance of HPSG parsing has been improved so that the parsers can be applied to real-world texts. CFG filtering is one of the techniques that contributed to this progress. It improved parsing speed by filtering out impossible parse trees using the CFG compiled from a given HPSG-based grammar. However, there is a limit to the speed-up. This is because the compiled CFG grows into...
A log-linear model with an n-gram reference distribution for accurate HPSG parsing
This paper describes a log-linear model with an n-gram reference distribution for accurate probabilistic HPSG parsing. In the model, the n-gram reference distribution is simply defined as the product of the probabilities of selecting lexical entries, which are provided by the discriminative method with machine learning features of word and POS n-gram as defined in the CCG/HPSG/CDG supertagging....
Efficient HPSG Parsing Algorithm with Array Unification
This paper presents a method for improving parsing performance of parsers for HPSG. The method was obtained by extending Torisawa’s parsing method for HPSG. His parsing method utilizes a CFG compiled from a given HPSG-based grammar, and the parser predicts the possible parse trees with the CFG. Since the amount of unification is reduced because of this prediction, parsing performance is improve...
Efficacy of Beam Thresholding, Unification Filtering and Hybrid Parsing in Probabilistic HPSG Parsing
We investigated the performance efficacy of beam search parsing and deep parsing techniques in probabilistic HPSG parsing using the Penn treebank. We first tested the beam thresholding and iterative parsing developed for PCFG parsing with an HPSG. Next, we tested three techniques originally developed for deep parsing: quick check, large constituent inhibition, and hybrid parsing with a CFG chun...